By generating the specifics of a network structure only when needed (on-the-fly), we derive a simple stochastic process that exactly models the time evolution of susceptible-infectious dynamics on finite-size networks. The small number of dynamical variables of this birth-death Markov process greatly simplifies analytical calculations. We show how a dual analytical description, treating large scale epidemics with a Gaussian approximation and small outbreaks with a branching process, provides an accurate approximation of the distribution even for rather small networks. The approach also offers important computational advantages and generalizes to a vast class of systems.



I. INTRODUCTION 

Real-world systems are often composed of numerous interacting elements. Complex network models prove to be valuable tools for systems where interactions are neither completely random nor completely regular [1,2]. Among these systems, an important subclass concerns the propagation of something through interactions among the constituting elements. Examples include spreading of infectious diseases in populations [3-8] as well as propagation of information [9-11], rumors [12-14] or viral marketing [15,16] on social networks. We will hereafter call infection whatever is propagating.

Some modeling approaches are known to exactly reproduce the behavior of propagation on networks in specific limiting cases. For example, branching processes [17,18] may exactly predict the probability distribution for the final state of a system of infinite size. Similarly, heterogeneous mean field models [19,20] may exactly predict the time evolution of relevant mean values for an infinite system that is annealed (i.e., its structure changes at a rate arbitrarily faster than the propagation process). Finally, exact models are also possible for very specific network structures, e.g., a linear chain [21].

In this article, we present a stochastic process that exactly reproduces a propagation dynamics on quenched (fixed structure) configuration model networks of arbitrary size allowing for repeated links and self-loops (to be defined shortly). Section II defines the problem at hand, then presents our approach by comparing it to a computer simulation algorithm which does not require a "network building" phase. However, this perspective is much more than an algorithmic trick saving computer resources: it changes a problem of propagation on a network into a Markov birth-death process, a momentous difference from an analytical point of view. In Sec. III we assume a large system size and obtain analytical results both for the asymptotic behavior of the "epidemics", where an important fraction of the network gets infected, and for the probability distribution of the outbreaks, where a small number of nodes are affected. Our results compare advantageously to numerical simulations and account for finite-size effects. Finally, we show in Sec. IV how this approach generalizes to a vast class of systems and discuss possibilities for future improvements.



II. THE EXACT MODEL 
A. Networks 

A network model uses nodes to represent the elements composing the system of interest and assigns links between each pair of nodes corresponding to interacting elements. Two nodes sharing a link are said to be neighbors, and the degree of a node is its number of neighbors. The part of a link that is attached to a node is called a stub: there are two stubs per link and each node is attached to a number of stubs equal to its degree. A link with both ends leading to the same node is called a self-loop, and repeated links occur when more than one link joins the same pair of nodes.

We define the configuration model (CM) [22] specified by the vector $\mathbf{n} = [n_0\; n_1\; \cdots]$ as the (microcanonical) ensemble of networks such that each network of this ensemble contains, for each $k$, exactly $n_k$ nodes of degree $k$. Clearly, each network of this ensemble has the same number of nodes $N = \sum_k n_k$. Since there are two stubs per link, the total number of stubs $\sum_k k\,n_k$ must be even.

It is common practice to explicitly *forbid* self-loops and repeated links in CMs (CMF) since these structures are not observed in many real-world systems. However, it is often easier to study CMs allowing for self-loops and repeated links (CMA). Of importance is the fact that the distinction between CMF and CMA vanishes for large networks (the probability for a link in a CMA to be a self-loop or a repeated link goes as $N^{-1}$). The knowledge acquired on CMAs can thus be translated to CMFs.

A simple way to build a CM network goes as follows. (i) For each $k \in \{0, 1, \ldots\}$, create $n_k$ nodes with $k$ stubs. (ii) Randomly select a pair of unmatched stubs and match them to form a link. Special restriction for CMFs: if a self-loop or repeated link is created, discard the whole network and return to step i. (iii) Repeat until there are no unmatched stubs left.
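Steps (i)-(iii) can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the authors' code: the degree vector is a Python list `n` with `n[k]` $= n_k$, links are returned as an edge list, and the repeated random selection of step (ii) is replaced by a single uniform shuffle followed by pairing consecutive stubs, which produces the same uniformly random matching. The CMF rejection test is applied once the matching is complete, which accepts exactly the same networks as discarding at the first violation. The helper names `build_cma` and `build_cmf` are hypothetical.

```python
import random

def build_cma(n, rng=random):
    """One CMA network for degree vector n (n[k] = number of nodes of degree k).

    Returns a list of links (u, v); self-loops (u == v) and repeated links
    are allowed. Nodes are labeled 0..N-1 in order of increasing degree.
    """
    stubs = []                        # step (i): one entry per stub, holding its node
    node = 0
    for k, count in enumerate(n):
        for _ in range(count):
            stubs.extend([node] * k)
            node += 1
    if len(stubs) % 2:
        raise ValueError("the total number of stubs must be even")
    # Steps (ii)-(iii): a uniform shuffle followed by pairing consecutive
    # stubs yields a uniformly random matching of all stubs.
    rng.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

def build_cmf(n, rng=random, max_tries=10_000):
    """CMF variant: restart from step (i) whenever a self-loop or repeated link appears."""
    for _ in range(max_tries):
        links, seen = build_cma(n, rng), set()
        for u, v in links:
            key = (min(u, v), max(u, v))
            if u == v or key in seen:
                break
            seen.add(key)
        else:                         # no violation found: the network is simple
            return links
    raise RuntimeError("no simple network found within max_tries attempts")
```

For instance, `build_cma([0, 16, 8, 4, 2])` draws one network with the degree sequence used in Fig. 2.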



B. Propagation 

For the sake of demonstration, we first consider what may well be the simplest form of propagation on networks: the susceptible-infectious (SI) model. A node is said to be susceptible if it does not carry the infection and infectious if it does. During an infinitesimal time interval $[t, t+dt)$, a susceptible node that neighbors an infectious one has a probability $\beta\,dt$ of acquiring the infection from the latter, hence becoming infectious. Once infectious, a node remains in this state forever.

[Figure 1 panels: (a) $x_{-1} = 22$, $x_3 = 2$, $\lambda(\mathbf{x}) = 5$; (b) $x_{-1} = 18$, $x_3 = 1$, $\lambda(\mathbf{x}) = 4$.]

FIG. 1. Illustration of on-the-fly network construction. Susceptible (white circles) and infectious (gray circles) nodes each have a number of stubs equal to their respective degree. (a) At some point in the process, three links (thin black curves) have already been assigned. Future dynamics does not depend on how infectious nodes are linked (content of the gray zone) except for the total number $\lambda(\mathbf{x})$ of unassigned stubs belonging to infectious nodes (stubs crossing the dashed border of the gray zone). (b) During any time interval $[t, t+dt)$, there is probability $\beta\lambda(\mathbf{x})\,dt$ for an event to occur. Here, after many such time intervals, two new links have been assigned through an event of type $j = 3$ (matching stubs A and B) and an event of type $j = -1$ (matching stubs C and D). Again, other than for $\lambda(\mathbf{x})$, the future dynamics is not affected by how infectious nodes are linked.

For a given network structure, the following gives an algorithmic implementation of the SI model. (i) Set each node as either susceptible or infectious according to the initial conditions. (ii) Define small time intervals and start with the first one. (iii) For each infectious node, look up its susceptible neighbors. For each of them, randomly generate a number in the interval $[0, 1)$ and test whether it is lower than $\beta\,dt$. If so, mark the corresponding node as infectious in the next time interval. (iv) Repeat step iii for the next time interval.

Now consider the following change to step iii: perform the random number test for each neighbor of the infectious node and, only when the test returns positive, verify whether the corresponding neighbor is susceptible (if so, mark it as infectious). This alternative algorithm is equivalent in all points to the original, except that the knowledge of who is the neighbor of an infectious node is not required until the very moment an infection may occur. Inspired by this seemingly benign observation, we will shortly present a stochastic process, equivalent to susceptible-infectious dynamics on CMA, that does not require an initial network construction step. Instead, the network will be built on-the-fly, concurrently with the propagation.
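The discrete-time algorithm of steps (i)-(iv), together with the reordered test of step iii, can be sketched as follows. This is only an illustration under our own assumptions: the network is given as an adjacency list `adj` with one entry per link (so repeated links appear several times), and the function name `si_discrete` and its `test_first` flag are hypothetical. Both orderings produce the same statistics; only the moment at which the neighbor's state is read differs.

```python
import random

def si_discrete(adj, infectious, beta, dt, steps, test_first=False, rng=random):
    """Discrete-time SI dynamics on a fixed network.

    adj[v] lists the neighbors of node v (one entry per link). With
    test_first=True the random test is drawn before the neighbor's state
    is looked up, as in the modified step iii.
    Returns the number of infectious nodes after each time interval.
    """
    infected = [False] * len(adj)           # step (i)
    for v in infectious:
        infected[v] = True
    counts = [sum(infected)]
    for _ in range(steps):                  # steps (ii) and (iv)
        newly = set()
        for v in range(len(adj)):
            if not infected[v]:
                continue
            for w in adj[v]:                # step (iii)
                if test_first:
                    hit = rng.random() < beta * dt and not infected[w]
                else:
                    hit = not infected[w] and rng.random() < beta * dt
                if hit:
                    newly.add(w)
        for w in newly:                     # infections take effect in the next interval
            infected[w] = True
        counts.append(sum(infected))
    return counts
```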



C. Equivalent stochastic process 

CMA networks are built by randomly matching stubs together. In order to perform this match on-the-fly, we track the total number $x_{-1}$ of unmatched stubs. All stubs belonging to susceptible nodes are unmatched. Denoting by $x_k$ the number of susceptible nodes of degree $k$, the total number of unmatched stubs belonging to infectious nodes is then

$$\lambda(\mathbf{x}) = x_{-1} - \sum_{k=0}^{k_{\max}} k\,x_k, \tag{1}$$

where $\mathbf{x} = [x_{-1}\; x_0\; x_1\; \cdots\; x_{k_{\max}}]^{\mathsf T}$ is the state vector.

During the interval $[t, t+dt)$, each of these $\lambda(\mathbf{x})$ stubs has a probability $\beta\,dt$ of infecting the corresponding neighboring node, under the condition that this node is currently susceptible. Since this infectious stub is currently unmatched, knowing which node is at the other end simply requires matching it at random to one of the $(x_{-1} - 1)$ other unassigned stubs. If a susceptible stub is chosen, the corresponding node is immediately infected, so that no matched stub ever belongs to a susceptible node.

Since $dt$ is infinitesimal, matching one of the $\lambda(\mathbf{x})$ stubs has a probability $\beta\lambda(\mathbf{x})\,dt$ of occurring. In this case, the other stub selected for the match has a probability $[\lambda(\mathbf{x}) - 1](x_{-1} - 1)^{-1}$ of also being infectious, causing no new infection. Matching the two stubs amounts to decreasing $x_{-1}$ [and therefore $\lambda(\mathbf{x})$] by 2. We refer to this class of events as a transition of type $j = -1$.

Alternatively, there is a probability $k\,x_k(x_{-1} - 1)^{-1}$ of matching the infectious stub to a stub belonging to a susceptible node of degree $k$: this node is marked as infectious by decreasing $x_k$ by 1. Again, $x_{-1}$ is decreased by 2 since two stubs have been matched together. This kind of event is referred to as a transition of type $j = k$.

Figure 1 illustrates the Markov stochastic process defined by these state vectors and transition rules. One may see the process from the infection's perspective: until it has crossed a link, it has no information concerning the node at the other end. More formally, the master equation (notation compatible with [23] §7.5)



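$$\frac{dP(\mathbf{x},t)}{dt} = \sum_{j=-1}^{k_{\max}} \Big[ q_j(\mathbf{x} - \mathbf{r}_j)\,P(\mathbf{x} - \mathbf{r}_j, t) - q_j(\mathbf{x})\,P(\mathbf{x}, t) \Big] \tag{2}$$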

governs the probability $P(\mathbf{x},t)$ of observing state $\mathbf{x}$ at time $t$. For each transition type $j$, the function $q_j(\mathbf{x})$ gives the probability rate at which this type of event occurs (given that the state of the system is currently $\mathbf{x}$), while the vector $\mathbf{r}_j$ gives the change caused by the transition (i.e., the state becomes $\mathbf{x} + \mathbf{r}_j$ after the transition). Translating the previous discussion into these terms, we obtain

$$q_j(\mathbf{x}) = \begin{cases} \beta\lambda(\mathbf{x})\,\dfrac{\lambda(\mathbf{x}) - 1}{x_{-1} - 1} & \text{if } j = -1,\\[8pt] \beta\lambda(\mathbf{x})\,\dfrac{j\,x_j}{x_{-1} - 1} & \text{if } j \ge 0, \end{cases} \tag{3}$$

for the rate at which transitions occur and

$$\mathbf{r}_j = \begin{cases} [-2\;\; 0\;\; 0\;\; \cdots\;\; 0]^{\mathsf T} & \text{if } j = -1,\\ [-2\;\; -\delta_{0j}\;\; -\delta_{1j}\;\; \cdots\;\; -\delta_{k_{\max}j}]^{\mathsf T} & \text{if } j \ge 0, \end{cases} \tag{4}$$

($\mathbf{r}_{-1}$ has $-2$ at position $-1$ and $0$ everywhere else, and $\mathbf{r}_j$ with $j \ge 0$ has an additional $-1$ at position $j$) for the effect of such transitions. We use $\beta = 1$ without loss of generality (scaling of the time unit). Equations (2)-(4) define the stochastic process of the configuration model generated on-the-fly (CMOtF). A similar approach [24] has been developed independently for the rigorous proof that a specific spreading model proposed by Volz [25] holds true in the limit of large network size.

FIG. 2. (Color online) Snapshots at different times (line styles) of the probability distributions for the number of infectious nodes in three configuration models (line weight and color). The on-the-fly process (CMOtF) and the configuration model allowing self-loops and repeated links (CMA) both give the same results. Even in such a small network ($N = 30$), forbidding self-loops and repeated links (CMF) has minimal effect. Each distribution has been obtained through $10^{\ldots}$ Monte Carlo simulations. Degree sequence used: $n_1 = 16$, $n_2 = 8$, $n_3 = 4$ and $n_4 = 2$. All nodes are initially susceptible except for one infectious node of degree 1.
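The transition rules of Eqs. (1), (3) and (4) fit in a few helper functions. This is a sketch under our own conventions, not part of the original model description: the state vector is a Python list with `x[0]` holding $x_{-1}$ and `x[k + 1]` holding $x_k$, and the names `lam`, `rates` and `step_state` are hypothetical. The usage check reproduces the two transitions of Fig. 1; only $x_{-1}$, $x_3$ and $\lambda(\mathbf{x})$ come from the figure, while the values chosen for $x_1$ and $x_2$ are hypothetical fillers that make $\lambda(\mathbf{x}) = 5$.

```python
def lam(x):
    """lambda(x) of Eq. (1): x[0] holds x_{-1} and x[k + 1] holds x_k."""
    return x[0] - sum(k * xk for k, xk in enumerate(x[1:]))

def rates(x, beta=1.0):
    """Nonzero rates q_j(x) of Eq. (3), keyed by transition type j."""
    lmb = lam(x)
    if lmb <= 0:
        return {}                     # no infectious stub left: absorbing state
    denom = x[0] - 1                  # the (x_{-1} - 1) other unassigned stubs
    q = {}
    if lmb > 1:
        q[-1] = beta * lmb * (lmb - 1) / denom
    for k, xk in enumerate(x[1:]):
        if k > 0 and xk > 0:
            q[k] = beta * lmb * k * xk / denom
    return q

def step_state(x, j):
    """Return x + r_j, with r_j from Eq. (4)."""
    y = list(x)
    y[0] -= 2                         # every transition matches two stubs
    if j >= 0:
        y[j + 1] -= 1                 # a susceptible node of degree j becomes infectious
    return y

# Fig. 1 example (x_1 = 3 and x_2 = 4 are hypothetical fillers).
x = [22, 0, 3, 4, 2]                  # x_{-1} = 22, x_3 = 2, lambda = 5
y = step_state(step_state(x, 3), -1)  # a type-3 event, then a type -1 event
```

Because the weights $\lambda(\mathbf{x}) - 1$ and $j\,x_j$ add up to $x_{-1} - 1$, the rates always sum to $\beta\lambda(\mathbf{x})$, the total event rate quoted in the caption of Fig. 1.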



D. Comparison to numerical simulations 

Figure 2 is obtained through direct Monte Carlo simulations for a network of $N = 30$ nodes. Results for CMA and CMOtF are essentially identical (i.e., the difference between them decreases as the inverse square root of the number of Monte Carlo simulations), in agreement with our claim that CMOtF exactly reproduces the behavior of CMA. The effect of forbidding self-loops and repeated links accounts for the slight difference between results for CMF simulations and their CMOtF counterparts. Larger system sizes further decrease these differences ($N = 300$ in Fig. 3), and therefore CMOtFs become excellent approximations of CMFs.

In terms of storage requirements, each CMOtF Monte Carlo simulation needs only to track the $k_{\max} + 2$ integers composing the state vector $\mathbf{x}$. By comparison, a standard algorithm, first building the network then propagating the infections, must store the network structure as an adjacency list of $Nz$ elements, where $z$ is the average degree. Since $k_{\max} \ll N$ for many networks of interest, the scaling of the memory requirements strongly favors CMOtF for large $N$ (e.g., $N = 10^{\ldots}$, $z = 5$ and $k_{\max} = 100$).

Moreover, CMOtF will usually run faster than a standard algorithm since it does not need to generate the parts of the network that are not affected by the infection. Hence, if CMA requires time $\tau_{\text{build}}$ to generate the network and time $\tau_{\text{spread}}$ to perform the SI simulation, CMOtF will approximately require time $p\,\tau_{\text{build}} + \tau_{\text{spread}}$, where $p \in [0, 1]$ is the fraction of the links that actually need to be allocated on-the-fly. At worst ($p = 1$), the execution time will be similar.

For the sake of simplicity, the numerical algorithms for CMA, CMF and CMOtF were all presented in terms of infinitesimal time intervals. While this perspective is closer to our analytical work, these algorithms may be translated to a Gillespie-like [26] form that is faster and exact (to numerical precision). Here is how this translation is done.

In the case of CMA and CMF, the network construction is done as usual and the following algorithm is used. (i) Set each node as either susceptible or infectious according to the initial conditions. (ii) For each infectious node and for each of its susceptible neighbors, draw a random number $\Delta t > 0$ from the probability density function $\beta e^{-\beta\Delta t}$ and assign to this neighbor a clock that will ring at time $\Delta t$. (iii) Whenever a clock rings, check the state of the associated node. If it is susceptible, make the node infectious and proceed to step iv. If it is already infectious, skip step iv and go to step v. (iv) For each susceptible neighbor of the newly infectious node, draw a random number $\Delta t > 0$ from the probability density function $\beta e^{-\beta\Delta t}$ and assign to this neighbor a clock that will ring at time $t + \Delta t$ (where $t$ is the current time). (v) Return to step iii until no clocks remain.
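The clock-based algorithm for CMA and CMF maps naturally onto a priority queue. The sketch below is our own illustration, assuming a prebuilt adjacency list `adj` (one entry per link) and using the standard-library `heapq` module to always pop the earliest clock; the function name `si_event_driven` is hypothetical.

```python
import heapq
import random

def si_event_driven(adj, infectious, beta=1.0, rng=random):
    """Clock-based SI dynamics on a fixed network (steps i-v above).

    Returns the list of (infection time, node) pairs; initial nodes get t = 0.
    """
    state = [False] * len(adj)
    clocks = []                              # heap of (ring time, node)
    history = []

    def arm_neighbors(v, t):
        # Steps (ii)/(iv): one exponential clock of rate beta per susceptible neighbor.
        for w in adj[v]:
            if not state[w]:
                heapq.heappush(clocks, (t + rng.expovariate(beta), w))

    for v in infectious:                     # step (i)
        state[v] = True
        history.append((0.0, v))
    for v in infectious:
        arm_neighbors(v, 0.0)
    while clocks:                            # step (v)
        t, w = heapq.heappop(clocks)         # step (iii): earliest clock rings
        if state[w]:
            continue                         # already infectious: skip step (iv)
        state[w] = True
        history.append((t, w))
        arm_neighbors(w, t)
    return history
```

Clocks pointing at nodes that have meanwhile become infectious are simply discarded when they ring, exactly as in step iii.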

In the case of CMOtF, the algorithm goes as follows. (i) Set $\mathbf{x} = \mathbf{x}(0)$ (its initial condition) and $t = 0$. (ii) Draw a random number $\Delta t > 0$ from the probability density function $\beta\lambda(\mathbf{x})\,e^{-\beta\lambda(\mathbf{x})\Delta t}$. (iii) Draw a random integer $j \ge -1$ such that $j = -1$ has probability $[\lambda(\mathbf{x}) - 1]/(x_{-1} - 1)$ of occurring, while each $j > 0$ occurs with probability $j\,x_j/(x_{-1} - 1)$. (iv) Increment $t$ by $\Delta t$ and $\mathbf{x}$ by $\mathbf{r}_j$. (v) Return to step ii until $\lambda(\mathbf{x}) = 0$.
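Steps (i)-(v) translate almost line by line into code. The sketch below is ours, not the authors' implementation; it assumes the state vector is stored as a Python list with `x[0]` holding $x_{-1}$ and `x[k + 1]` holding $x_k$, and it samples $j$ by walking the cumulative weights $\lambda(\mathbf{x}) - 1,\ 1\cdot x_1,\ 2\cdot x_2, \ldots$, whose total is exactly $x_{-1} - 1$. The name `cmotf_gillespie` is hypothetical.

```python
import random

def cmotf_gillespie(x0, beta=1.0, rng=random):
    """Gillespie-like CMOtF simulation of SI dynamics (steps i-v above).

    x0[0] holds x_{-1}, the number of unmatched stubs (must be even), and
    x0[k + 1] holds x_k, the number of susceptible nodes of degree k.
    Returns a list of (time, state) pairs, one per transition.
    """
    x = list(x0)                                   # step (i)
    if x[0] % 2:
        raise ValueError("x_{-1} must be even")
    t = 0.0
    lam = x[0] - sum(k * xk for k, xk in enumerate(x[1:]))
    traj = [(t, tuple(x))]
    while lam > 0:                                 # step (v)
        t += rng.expovariate(beta * lam)           # steps (ii) and (iv): waiting time
        # Step (iii): weights lambda - 1 for j = -1 and j * x_j for j > 0;
        # they sum to x_{-1} - 1, so no explicit normalization is needed.
        r = rng.random() * (x[0] - 1)
        j, acc = -1, lam - 1
        if r >= acc:
            for k in range(1, len(x) - 1):
                acc += k * x[k + 1]
                if r < acc:
                    j = k
                    break
        x[0] -= 2                                  # step (iv): x += r_j, Eq. (4)
        if j > 0:
            x[j + 1] -= 1
        lam = x[0] - sum(k * xk for k, xk in enumerate(x[1:]))
        traj.append((t, tuple(x)))
    return traj
```

For the setting of Fig. 2 (one infectious node of degree 1), the initial state is `[52, 0, 15, 8, 4, 2]`, and the final number of infectious nodes is $N$ minus the sum of the $x_k$ in the last state.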



III. ASYMPTOTICALLY LARGE SYSTEMS 

A. Gaussian approximation 

The framework of a stochastic equation of the type defined by Eqs. (2)-(4) offers the possibility of further simplification. Here also, the analytical tractability is a consequence of the reduction to a state vector of dimension $k_{\max} + 2$, which is perhaps the most significant advantage of our approach. As long as all elements of $\mathbf{x}(t)$ [and $\lambda(\mathbf{x}(t))$] are sufficiently "large", Eq. (2) can be approximated by a stochastic differential equation (see [23] §4.3.5)

$$d\mathbf{x} = \mathbf{a}(\mathbf{x})\,dt + \mathsf{B}(\mathbf{x})\cdot d\mathbf{W}, \tag{5}$$

where the vector $\mathbf{W}(t)$ is a Wiener process, while the vector $\mathbf{a}(\mathbf{x})$ and the matrix $\mathsf{B}(\mathbf{x})$ are given in terms of $q_j(\mathbf{x})$ and $\mathbf{r}_j$ as

$$\mathbf{a}(\mathbf{x}) = \sum_j \mathbf{r}_j\,q_j(\mathbf{x}), \qquad B_{ij}(\mathbf{x}) = (\mathbf{r}_j)_i\,\sqrt{q_j(\mathbf{x})}. \tag{6}$$

An approximate solution $\mathbf{x}(t) \approx \boldsymbol{\mu}(t) + \boldsymbol{\nu}(t)$, composed of a deterministic term $\boldsymbol{\mu}(t)$ and a stochastic perturbation $\boldsymbol{\nu}(t)$, can be obtained when the noise term $\mathsf{B}(\mathbf{x})\cdot d\mathbf{W}$ is much smaller than the deterministic term $\mathbf{a}(\mathbf{x})\,dt$ [which implies that the value of $\mathbf{x}(t)$ remains close to that of $\boldsymbol{\mu}(t)$]. Using the initial conditions $\boldsymbol{\mu}(0) = \mathbf{x}(0)$ and $\boldsymbol{\nu}(0) = \mathbf{0}$, the ordinary differential equation

$$\frac{d\boldsymbol{\mu}}{dt} = \mathbf{a}(\boldsymbol{\mu}) \tag{7}$$

governs the deterministic contribution. The approximation $\mu_{-1} - 1 \approx \mu_{-1}$, valid when $\mu_{-1}$ remains large, gives

$$\frac{d\mu_{-1}}{dt} = -2\lambda(\boldsymbol{\mu}), \qquad \frac{d\mu_k}{dt} = -\frac{k\,\mu_k\,\lambda(\boldsymbol{\mu})}{\mu_{-1}}. \tag{8}$$

FIG. 3. (Color online) Probability distribution for the number of infectious nodes. The network is sufficiently large ($N = 300$) for our asymptotic approximation to match the CM distributions around their peaks. Note that, especially at early times, the CMF results are very close to those of CMA and CMOtF (which is also an effect of the larger network). $10^{\ldots}$ Monte Carlo simulations. Degree sequence: $n_1 = 160$, $n_2 = 80$, $n_3 = 40$ and $n_4 = 20$. For each degree, 5% of the nodes are initially infectious.



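One way to solve this system is to introduce a "time parameter"

$$\theta = \sqrt{\frac{\mu_{-1}}{x_{-1}(0)}}$$

such that

$$\frac{d\theta}{dt} = -\frac{\lambda(\boldsymbol{\mu})}{\theta\,x_{-1}(0)}. \tag{9}$$

We may then use Eq. (9) as a change of variable in Eq. (8), replacing the "actual time" $t$ by $\theta$. Note that $t = 0$ corresponds to $\theta = 1$ and that $\theta$ decreases with time. The resulting $d\mu_j/d\theta$ differential equations are much simpler, with solutions

$$\mu_{-1} = x_{-1}(0)\,\theta^2, \qquad \mu_k = x_k(0)\,\theta^k \tag{10}$$

as a function of $\theta$. These can then be used in Eq. (9) to obtain an ordinary differential equation depending on $\theta$ alone,

$$\frac{d\theta}{dt} = -\theta + \frac{1}{x_{-1}(0)}\sum_{k=0}^{k_{\max}} k\,x_k(0)\,\theta^{k-1}. \tag{11}$$

Solving for $\theta$ as a function of $t$ will then provide $\boldsymbol{\mu}$ through Eq. (10). This is in agreement with previous results based on a heterogeneous mean field approach [25,27,28]. The final state $t \to \infty$ corresponds to the largest $\theta \in [0, 1]$ such that

$$\theta = \frac{1}{x_{-1}(0)}\sum_{k=0}^{k_{\max}} k\,x_k(0)\,\theta^{k-1}.$$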
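Equation (11) and its final state are easy to evaluate numerically. The sketch below is illustrative only: it assumes $\beta = 1$, stores the initial state with `x0[0]` $= x_{-1}(0)$ and `x0[k + 1]` $= x_k(0)$, integrates with a classical fourth-order Runge-Kutta step, and finds the largest root in $[0, 1]$ by scanning down from $\theta = 1$ and then bisecting. These numerical choices and all function names are our own. The mean final number of infectious nodes then follows from Eq. (10) as $N - \sum_k x_k(0)\,\theta^k$.

```python
def make_rhs(x0):
    """Right-hand side of Eq. (11); x0[0] = x_{-1}(0), x0[k + 1] = x_k(0), beta = 1."""
    xm1 = x0[0]
    terms = [(k, k * xk) for k, xk in enumerate(x0[1:]) if k > 0]
    def rhs(th):
        return -th + sum(c * th ** (k - 1) for k, c in terms) / xm1
    return rhs

def theta_trajectory(x0, t_max, dt=0.01):
    """Integrate Eq. (11) from theta(0) = 1 with fourth-order Runge-Kutta."""
    rhs = make_rhs(x0)
    t, th = 0.0, 1.0
    out = [(t, th)]
    while t < t_max - 1e-12:
        k1 = rhs(th)
        k2 = rhs(th + 0.5 * dt * k1)
        k3 = rhs(th + 0.5 * dt * k2)
        k4 = rhs(th + dt * k3)
        th += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
        out.append((t, th))
    return out

def theta_final(x0, grid=10_000, tol=1e-13):
    """Largest theta in [0, 1] where Eq. (11) vanishes (the t -> infinity state).

    Scans downward from theta = 1 for the first sign change, then bisects;
    two roots closer than 1/grid could be missed.
    """
    rhs = make_rhs(x0)
    if rhs(1.0) >= 0:
        return 1.0
    hi = 1.0
    for i in range(1, grid + 1):
        lo = 1.0 - i / grid
        if rhs(lo) >= 0:
            while hi - lo > tol:              # keeps rhs(lo) >= 0 > rhs(hi)
                mid = 0.5 * (lo + hi)
                if rhs(mid) >= 0:
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)
        hi = lo
    return 0.0

def final_mean_infectious(x0, n_nodes):
    """Mean final number of infectious nodes, N - sum_k x_k(0) theta^k, from Eq. (10)."""
    th = theta_final(x0)
    return n_nodes - sum(xk * th ** k for k, xk in enumerate(x0[1:]))
```

With the initial condition of Fig. 3 (5% of each degree class infectious), the state is `[520, 0, 152, 76, 38, 19]` and `final_mean_infectious(..., 300)` returns roughly 190 infectious nodes.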



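The perturbation term $\boldsymbol{\nu}$ can be obtained by solving the stochastic differential equation (see [23] §6.2)

$$d\boldsymbol{\nu} = \mathsf{J}_a(\boldsymbol{\mu})\cdot\boldsymbol{\nu}\,dt + \mathsf{B}(\boldsymbol{\mu})\cdot d\mathbf{W}, \tag{12}$$

where $\mathsf{J}_a(\boldsymbol{\mu})$ is the Jacobian matrix of $\mathbf{a}$ evaluated at $\boldsymbol{\mu}$. Initial conditions $\langle\boldsymbol{\nu}(0)\rangle = \mathbf{0}$ and $\operatorname{cov}(\boldsymbol{\nu}(0)) = \mathsf{0}$ give the solutions $\langle\boldsymbol{\nu}\rangle = \mathbf{0}$ and (see [23] §4.4.9)

$$\operatorname{cov}(\boldsymbol{\nu}) = \int_0^t \exp\!\left[\int_{t'}^{t}\mathsf{J}_a(\boldsymbol{\mu}(t''))\,dt''\right]\cdot\mathsf{B}(\boldsymbol{\mu}(t'))\,\mathsf{B}^{\mathsf T}(\boldsymbol{\mu}(t'))\cdot\exp\!\left[\int_{t'}^{t}\mathsf{J}_a^{\mathsf T}(\boldsymbol{\mu}(t''))\,dt''\right]dt'. \tag{13}$$

Since $\boldsymbol{\mu}$ and $\boldsymbol{\nu}$ contribute exclusively to the mean and covariance of $\mathbf{x}$, respectively, we obtain

$$\langle\mathbf{x}\rangle = \boldsymbol{\mu}, \qquad \operatorname{cov}(\mathbf{x}) = \operatorname{cov}(\boldsymbol{\nu}). \tag{14}$$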



Specifically, the mean number of infectious nodes is given by $N - \sum_{k=0}^{k_{\max}} \mu_k$, while its variance is $\sum_{k,k'=0}^{k_{\max}} [\operatorname{cov}(\boldsymbol{\nu})]_{kk'}$. These values allow us to approximate the probability distribution for the number of infectious nodes by a Gaussian distribution.
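The Gaussian approximation can also be evaluated without closed forms. The sketch below is our own numerical illustration, not the authors' procedure: it builds $\mathbf{a}(\mathbf{x})$ and $\mathsf{B}\mathsf{B}^{\mathsf T} = \sum_j \mathbf{r}_j\mathbf{r}_j^{\mathsf T} q_j$ from Eqs. (3), (4) and (6) (keeping the exact denominators $x_{-1} - 1$), approximates $\mathsf{J}_a$ by central finite differences, and, instead of evaluating the integral of Eq. (13), integrates the matrix equation $d\Sigma/dt = \mathsf{J}_a\Sigma + \Sigma\mathsf{J}_a^{\mathsf T} + \mathsf{B}\mathsf{B}^{\mathsf T}$ obeyed by the covariance $\Sigma$ of the linear SDE (12), using plain Euler steps. The list-based state layout (`x[0]` $= x_{-1}$, `x[k + 1]` $= x_k$), step sizes and function names are all assumptions.

```python
def drift(x, beta=1.0):
    """a(x) of Eq. (6): a_{-1} = -2 beta lambda(x), a_k = -q_k(x) from Eq. (3)."""
    lam = x[0] - sum(k * xk for k, xk in enumerate(x[1:]))
    a = [-2.0 * beta * lam] + [0.0] * (len(x) - 1)
    for k in range(1, len(x) - 1):
        a[k + 1] = -beta * lam * k * x[k + 1] / (x[0] - 1)
    return a

def diffusion(x, beta=1.0):
    """D(x) = B B^T = sum_j r_j r_j^T q_j, with q_j from Eq. (3) and r_j from Eq. (4)."""
    m = len(x)
    lam = x[0] - sum(k * xk for k, xk in enumerate(x[1:]))
    D = [[0.0] * m for _ in range(m)]
    for j in [-1] + list(range(1, m - 1)):
        weight = lam - 1 if j == -1 else j * x[j + 1]
        q = beta * lam * weight / (x[0] - 1)
        r = [0.0] * m
        r[0] = -2.0
        if j > 0:
            r[j + 1] = -1.0
        for i in range(m):
            for l in range(m):
                D[i][l] += r[i] * r[l] * q
    return D

def jacobian(x, beta=1.0, h=1e-5):
    """Central finite-difference Jacobian J_a(x), with J[i][l] = d a_i / d x_l."""
    m = len(x)
    J = [[0.0] * m for _ in range(m)]
    for l in range(m):
        xp, xm = list(x), list(x)
        xp[l] += h
        xm[l] -= h
        ap, am = drift(xp, beta), drift(xm, beta)
        for i in range(m):
            J[i][l] = (ap[i] - am[i]) / (2 * h)
    return J

def gaussian_si(x0, n_nodes, t_max, dt=0.01, beta=1.0):
    """Euler integration of Eq. (7) and of the covariance equation.

    Returns (mean, variance) of the number of infectious nodes at t_max.
    """
    m = len(x0)
    mu = [float(v) for v in x0]
    C = [[0.0] * m for _ in range(m)]               # cov(nu(0)) = 0
    t = 0.0
    while t < t_max - 1e-12:
        a, J, D = drift(mu, beta), jacobian(mu, beta), diffusion(mu, beta)
        JC = [[sum(J[i][s] * C[s][l] for s in range(m)) for l in range(m)]
              for i in range(m)]
        # C J^T equals (J C)^T because C is symmetric.
        C = [[C[i][l] + dt * (JC[i][l] + JC[l][i] + D[i][l]) for l in range(m)]
             for i in range(m)]
        mu = [mu[i] + dt * a[i] for i in range(m)]
        t += dt
    mean = n_nodes - sum(mu[1:])
    var = sum(C[i][l] for i in range(1, m) for l in range(1, m))
    return mean, var
```

The returned mean and variance parametrize the Gaussian distribution described above; for the Fig. 3 setting the mean at late times should approach the final value obtained from Eqs. (10)-(11).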



B. Branching process approximation 



In Sec. III A we have assumed that the probability distribution remains concentrated about its mean value. However, it is well known that this assumption is invalid when the initial condition contains a very small number of infectious nodes. In fact, even if the parameters are such that the infection should initially grow on average, random events may cause an early end to the infection, thus splitting the probability distribution into two parts: small outbreaks and large scale epidemics.

In order to consider such eventualities, we focus on the initial behavior of asymptotically large systems for which $\lambda(\mathbf{x}(0)) \ll x_{-1}(0)$. Since $\mathbf{x}$ does not change much during these early times, we may treat as a constant the probability $p_k$ for a random node to be of degree $k$,

$$p_k = \frac{x_k(0)}{\sum_{k'} x_{k'}(0)}. \tag{15}$$

The transition rates thus become

$$q_j(\mathbf{x}) \approx \beta\lambda(\mathbf{x})\,\frac{j\,p_j}{\sum_{k'} k'\,p_{k'}} \tag{16}$$

for events of type $j > 0$, and we may consider that events of type $j = -1$ do not occur. In this form, the problem can be viewed as a branching process: an infection event of type $j \ge 1$ directly causes $j - 1$ future infection events, the probability for each of those future events to be of type $j'$ being proportional to $j'\,p_{j'}$. We define generations of infections as follows: the nodes which begin as infectious at time $t = 0$ are part of generation 0, and generation $n$ contains all the nodes that have been infected by nodes of generation $n - 1$. Although some nodes of generation $n$ may be infected at an earlier time than some nodes of generation $n - 1$, a higher generation usually implies a later time of infection.

Following previous work [29,30], we model this branching process using probability generating functions (PGFs). For our purpose, we define a PGF as a power series whose coefficients are probabilities; see [31] for further details together with a more general perspective. A PGF generates its associated sequence of coefficients. Hence, the PGF

$$g_0(\zeta) = \sum_{k=0}^{k_{\max}} p_k\,\zeta^k \tag{17}$$

generates the probability distribution for the degree of a random node, while the PGF

$$g_1(\zeta) = \frac{g_0'(\zeta)}{g_0'(1)} = \frac{\sum_k k\,p_k\,\zeta^{k-1}}{\sum_{k'} k'\,p_{k'}} \tag{18}$$

generates the probability distribution for the excess degree of a node reached by following a random link ("excess" here means that the followed random link is excluded from the degree count). Alternatively, one may view the probability distribution generated by $g_1(\zeta)$ as that of the number of infections of generation $n+1$ that follow from a single infection of generation $n$.

PGFs allow for formal and/or analytical treatment of the generated sequences in the form of functions, often simplifying both the notation and the calculations [29-31]. For example, the composition $g_1(g_1(\zeta))$ generates the distribution of the number of infections of generation $n+2$ that follow from a single infection of generation $n$. Similarly, $\zeta\,g_1(\zeta\,g_1(\zeta))$ generates the total number of infections of generations $n$, $n+1$ and $n+2$ that follow from a single infection of generation $n$ (including that infection). The concept generalizes to more than one variable: $\xi\,g_1(\xi\,g_1(\zeta))$ generates, through $\xi$, the total number of infections of generations $n$ and $n+1$ and, through $\zeta$, the number of infections of generation $n+2$ that follow from a single infection of generation $n$.

As a slight generalization of the method presented in [30], we recursively introduce the two-variable PGFs

$$f_n(\xi,\zeta) = \xi\,g_1\big(f_{n-1}(\xi,\zeta)\big) \quad\text{with}\quad f_0(\xi,\zeta) = \zeta, \tag{19}$$

such that $f_n(\xi,\zeta)$ generates, through $\xi$, the total number of infections from generation 1 to $n$ and, through $\zeta$, the number of infections of generation $n+1$ that follow from a single infection of generation 1. Hence, for an initial condition where all nodes are susceptible except for one randomly chosen infectious node (generation 0), the PGF

$$h_n(\xi,\zeta) = \xi\,g_0\big(f_n(\xi,\zeta)\big) \tag{20}$$

generates, through $\xi$, the total number of infections from generation 0 to $n$ and, through $\zeta$, the number of infections of generation $n+1$ that stem from these initial conditions; the results of [30] correspond to $h_n(\xi,1)$. More generally, for an initial condition containing $I_0$ initially infectious nodes and $\lambda_0$ initially infectious stubs, the PGF becomes

$$h_n(\xi,\zeta;I_0,\lambda_0) = \xi^{I_0}\,\big[f_n(\xi,\zeta)\big]^{\lambda_0}. \tag{21}$$



We now seek to distinguish small outbreaks from large scale epidemics: the infinite-size propagation process terminates during an outbreak, while finite-size effects are required for an epidemic to end. In an infinite CM network [29], the probability for a single infection event to cause a terminating chain of infections (i.e., it may cause infections that themselves cause infections, etc., but the total number of infections caused this way is finite) is given by the lowest $u \ge 0$ satisfying

$$u = g_1(u). \tag{22}$$

Hence, $u < 1$ is the criterion for an epidemic to be possible. Denoting by $m$ the number of infectious nodes in generation $n+1$, the infinite-size infection process will terminate if and only if each one of the corresponding $m$ infection events causes a terminating chain of events; this occurs with probability $u^m$. Therefore, the total number of infections from generation 0 to $n$ that are part of outbreaks is generated by

$$h_n(\xi,u;I_0,\lambda_0) \tag{23}$$

in the general case [or by $h_n(\xi,u)$ for a single random initially infectious node]. Since any remaining case leads to an epidemic, the total number of infections from generation 0 to $n$ that are part of epidemics is generated by

$$h_n(\xi,1;I_0,\lambda_0) - h_n(\xi,u;I_0,\lambda_0) \tag{24}$$

in the general case [or by $h_n(\xi,1) - h_n(\xi,u)$ for a single random initially infectious node]. Since $f_n(1,u) = u$ for all $n$, one easily demonstrates that the total probability for an outbreak (or epidemic) is independent of the generation $n$.
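The lowest root of Eq. (22) can be found by iterating $u \leftarrow g_1(u)$ from $u = 0$: because $g_1$ has nonnegative coefficients, this sequence increases monotonically toward the smallest fixed point. The sketch below is a minimal illustration with hypothetical function names; `p` only needs to be proportional to the degree distribution, since Eq. (18) is normalized.

```python
def g1(z, p):
    """Eq. (18): excess-degree PGF; p[k] is proportional to the degree distribution."""
    norm = sum(k * pk for k, pk in enumerate(p))
    return sum(k * pk * z ** (k - 1) for k, pk in enumerate(p) if k > 0) / norm

def smallest_fixed_point(p, tol=1e-14, max_iter=100_000):
    """Lowest u >= 0 with u = g1(u), Eq. (22), by iterating u <- g1(u) from u = 0."""
    u = 0.0
    for _ in range(max_iter):
        nxt = g1(u, p)
        if abs(nxt - u) < tol:
            return nxt
        u = nxt
    return u

# Degree sequence of Fig. 3; the outbreak probability is u ** lambda_0.
u = smallest_fixed_point([0, 160, 80, 40, 20])
```

For this degree sequence the iteration gives $u \approx 0.6375$, and with $\lambda_0 = 26$ initially infectious stubs the outbreak probability $u^{26}$ is about $8 \times 10^{-6}$, the values used in Sec. III C.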

Extracting the generated distribution (coefficients) from a PGF may be done numerically through a Cauchy integral [29] or, more efficiently, through a Fast Fourier Transform (FFT).
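As an illustration of this extraction, the sketch below samples $h_n(\xi,u;I_0,\lambda_0)$ of Eqs. (19), (21) and (23) at the $m$-th roots of unity and inverts the samples with a discrete Fourier transform. For a dependency-free example it uses a direct $O(m^2)$ sum, which yields the same numbers as an FFT (for example `numpy.fft.fft(vals) / m`); the choice of $m$, the function names and the truncation at a finite generation $n$ are our own assumptions. Coefficients of order $m$ or higher alias onto lower orders, so $m$ must exceed the outbreak sizes that carry non-negligible probability.

```python
import cmath

def g1(z, p):
    """Eq. (18) for complex z; p[k] is proportional to the degree distribution."""
    norm = sum(k * pk for k, pk in enumerate(p))
    return sum(k * pk * z ** (k - 1) for k, pk in enumerate(p) if k > 0) / norm

def f_n(xi, zeta, p, n):
    """Eq. (19): f_0 = zeta and f_n = xi * g1(f_{n-1})."""
    f = zeta
    for _ in range(n):
        f = xi * g1(f, p)
    return f

def outbreak_size_distribution(p, u, n, i0, lam0, m=512):
    """Coefficients of h_n(xi, u; I_0, lambda_0) = xi**I_0 * f_n(xi, u)**lambda_0.

    The PGF is sampled on the m-th roots of unity and the samples are
    inverted with a direct discrete Fourier transform.
    """
    roots = [cmath.exp(2j * cmath.pi * s / m) for s in range(m)]
    vals = [z ** i0 * f_n(z, u, p, n) ** lam0 for z in roots]
    coeffs = []
    for c in range(m):
        # roots[(-s * c) % m] equals exp(-2*pi*i*s*c/m)
        acc = sum(vals[s] * roots[(-s * c) % m] for s in range(m))
        coeffs.append((acc / m).real)
    return coeffs
```

Entry `s` of the result approximates the probability that an outbreak (rather than an epidemic) involves exactly `s` infections up to generation `n`; the entries add up to $u^{\lambda_0}$, the total outbreak probability.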



C. Comparison to numerical simulations 



Applying the method of Sec. III A to the case studied in Fig. 3 shows that, although this Gaussian approximation of the exact dynamics [Eqs. (2)-(4)] assumes an asymptotically large system, Eq. (14) provides reasonable results for networks as small as $N = 300$. In other words, for $N$ sufficiently large, we can follow for all times the first two moments (mean and variance) of the exact dynamics. This size-independent Gaussian distribution becomes the universal limit for the underlying finite-size propagation model.

Part of this success is due to the fact that the initial condition contains $\lambda_0 = 26$ infectious stubs (since for each degree 5% of the nodes are infectious), a value large enough to (almost) guarantee that an epidemic will occur. In fact, using the method of Sec. III B we find $u \approx 0.6375$, which implies that a small outbreak, with total probability $u^{\lambda_0} \approx 8 \times 10^{-6}$, is very unlikely. This explains why completely neglecting the influence of small outbreaks provides accurate results in this case.

FIG. 4. (Color online) Probability distribution for the number of infectious nodes in the limit $t \to \infty$. Since the initial condition contains a small number of infectious stubs ($\lambda_0 = 1$ and $\lambda_0 = 4$), the CMOtF probability distribution (plain curves) is roughly divided into two sub-distributions: small outbreaks and large scale epidemics. (a) While the separation between these sub-distributions is unclear for small networks ($N = 300$), (b) the distinction becomes sharper as the size increases ($N = 3000$). Analytical results (dashed curves) are obtained through branching processes (outbreaks) and Gaussian approximation (epidemics). Summing the contributions of these two limiting behaviors [dotted curve, only visible in (a) around the 80 infectious nodes mark] is insufficient to obtain the correct distribution for the outbreaks of intermediate size. However, such intermediate events get less and less likely as the network size increases, thus making our two analytical distributions better approximations. Insets: zoom on the distributions for few infectious nodes.



Figure 4 investigates the behavior of the final distribution ($t \to \infty$) when small outbreaks cannot be neglected. Specifically, a single initially infectious node of degree 1 ($\lambda_0 = 1$) or of degree 4 ($\lambda_0 = 4$) is used for the same network as in Fig. 3 [$N = 300$, Fig. 4(a)] and for one with the same degree distribution and ten times as many nodes [$N = 3000$, Fig. 4(b)]. The distinction between the two limiting behaviors (outbreaks and epidemics) becomes clearer as $N$ increases. Further comparisons may also be made with Fig. 2 for $N = 30$ and $\lambda_0 = 1$.

For the small outbreaks, the branching process method of Sec. III B provides the final distribution for the small components through $h_\infty(\xi,u;1,\lambda_0)$. These results are in good agreement with the numerical simulations for the small outbreaks, and increasing the network size improves this agreement. However, the same branching process method cannot be used to predict the probability distribution for the epidemics in the limit $n \to \infty$: this distribution grows without bounds with $n$ since finite-size effects are completely neglected. One result that does hold is that the total probability for an epidemic is $1 - u^{\lambda_0}$.

We also use the Gaussian approximation of Sec. III A to predict the shape of the probability distribution for the epidemics, then weight the whole distribution with a factor $1 - u^{\lambda_0}$. As seen in Fig. 4, the results are again in good agreement with the numerical simulations, and increasing the network size improves this agreement. It should be noted that, a priori, there was no guarantee for this simple approach to work: not only are the assumptions leading to the Gaussian approximation not met, but also the propagation processes that have a number of infectious stubs below the average are more likely to end early (outbreaks) than those that are above average. This introduces a bias in the distribution for the epidemics. Nonetheless, the global shape of the final distribution is quite stable under such early perturbations. While the early behavior (and the initial conditions) is important for obtaining the total probability for epidemics, the final state of the epidemics is mainly governed by the finite-size effects. Combining the two methods thus provides a reliable estimate for the final distribution of the epidemics.

Although our analytical predictions are rather good, we systematically underestimate the value of the distribution for intermediate numbers of infections: the missing probabilities are being assigned to a larger number of infections. We may view such intermediate events as "small epidemics": they would have led to "real epidemics" in a larger network, but finite-size effects caused the propagation to stop earlier, leading to a number of infections that may be comparable to those of outbreaks. Increasing $N$ and/or $\lambda_0$ decreases the probability of these events, and therefore improves the quality of the results of our dual approach.

Finally, even for large $N$, our Gaussian approximation for the distribution of epidemics shows systematic deviations: the distribution falls off faster than a Gaussian for large numbers of infections, and falls off more slowly for smaller-than-average epidemics. This is due to the fact that the finite-size effects become noticeable faster than predicted by our linear approximation [the Jacobian matrix $\mathsf{J}_a(\boldsymbol{\mu})$]. Higher-order approximations should improve the description.



IV. CONCLUSION 

A. Generalization 

The approach presented in this contribution relies heavily on the fact that the SI dynamics can be expressed in a form where, for each link, we need to know the states of the two nodes joined by that link simultaneously at most once. In fact, we can generalize our exact approach to a vast class of systems for which this condition is respected. Indeed, given an arbitrary number of accessible node states (instead of "susceptible" and "infectious"), one could define a state vector $\mathbf{x}$ whose elements track the number of nodes with $k$ unassigned stubs for each accessible node state and for each possible value of $k$.

As a concrete example, a susceptible-infectious-removed (SIR) system, i.e., an SI system in which infectious nodes are removed at a constant probability rate, could be represented by the state vector

$$\mathbf{x} = [x_{S0}\; x_{I0}\; x_{R0}\; x_{S1}\; x_{I1}\; x_{R1}\; x_{S2}\; \cdots]^{\mathsf T},$$

where $x_{Sk}$, $x_{Ik}$ and $x_{Rk}$ stand for the number of susceptible, infectious and removed nodes with $k$ unassigned stubs, respectively [33]. Since the simultaneous knowledge of the states of two neighboring nodes is required at most once, we may perform on-the-fly neighbor assignment at the very time this knowledge is required, discarding the two stubs that were matched in the process.

An earlier version of this work [5] has made possible a recent contribution [8] which introduces a model for the deterministic (mean value) behavior of two interacting SIR processes taking place on two partially overlapping networks. Even though the dynamics is quite complicated, the on-the-fly perspective allows it to be described accurately at low computational cost. In this case as in many others, an exact stochastic version of the model could be implemented using the approach described in this article.

B. Summary and perspective 

We have presented a procedure that allows the construction of a network in a dynamical way on a need-to-know basis. This slight change of perspective has profound implications for the propagation dynamics on networks. It allows for a conceptual framework where the propagation is described exactly by a low-dimensional stochastic equation equivalent in all respects to the complete time evolution of the original problem. The low dimensionality translates into large computational gains and, most importantly, it allows for analytical results through the use of standard tools from stochastic calculus. Perhaps the simplest of these tools allowed us to obtain a Gaussian approximation of the distribution for all times which becomes exact in the large network limit. Another simple tool, the branching process approach, allowed for a basic study of the bimodal behavior of the distribution (outbreaks and epidemics) that occurs when the initial condition does not guarantee a certain epidemic. Future contributions could improve the analytical description of intermediate events caused by early finite-size effects, and refine the distribution for the epidemics beyond the Gaussian assumption. Another interesting area of research concerns the application of the general method to other problems. Recent steps towards a general stochastic approach to the spreading dynamics on complex networks have already been taken [34].